Abstract

We calculate the density and expectation for the number of lineages in a reconstructed tree with n extant species. This is done both by conditioning on the age of the tree and by assuming a uniform prior for the age of the tree.

1 Introduction

There are a variety of methods to extract information relevant to the macroevolutionary process from phylogenies (e.g. imbalance [Heard 1992]; Gamma [Pybus and Harvey 2001]). One popular approach is the lineage-through-time (LTT) plot, a plot of lineage accumulation through time derived from a dated phylogeny (Nee et al. 1992). To test the plausibility of a model of macroevolution for a particular clade, the LTT plot of the clade of interest can be compared to an expectation generated from the model (for a review of such models see Mooers et al. 2007 [book chapter], or Hartmann et al. in press). The expected shape can be obtained either through simulation or through analytical approaches. Simulations are a simple and therefore attractive approach to developing models of macroevolution, but their use can be treacherous (see Hartmann et al. for an illuminating discussion). Analytical approaches offer a computational advantage over simulation, but even simple models are quite challenging to analyze. Here, we add to the knowledge base of analytical approaches with respect to LTT plots.



We consider constant rate birth and death processes (Feller 1968). The birth rate is $\lambda$ and the death rate is $\mu$. We define $\rho := \mu/\lambda$ and $\delta := \lambda - \mu$. Constant rate birth and death processes are a popular null model ...

The birth and death process is conditioned on having n species today. A tree containing both extinct and extant species is called a complete tree, while the reconstructed tree is the complete tree with all extinct lineages removed.



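We will need the following functions, as defined in Nee et al. (1994):

$$P(t) = \frac{\lambda - \mu}{\lambda - \mu e^{-(\lambda-\mu)t}}, \tag{1}$$

$$u(t) = \lambda\, \frac{1 - e^{-(\lambda-\mu)t}}{\lambda - \mu e^{-(\lambda-\mu)t}}. \tag{2}$$

Here, $P(t)$ is the probability that a single lineage has descendants after time t, and, conditioned on survival, the number of these descendants is k with probability $(1 - u(t))\,u(t)^{k-1}$.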
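The two functions are straightforward to evaluate numerically. The following Python sketch is ours, not code from Nee et al.; the names `P` and `u` and the example rates are illustrative assumptions. It evaluates Equations (1) and (2) for $\lambda > \mu$, where one expects $P(0) = 1$, $u(0) = 0$ and, as t grows, $P(t) \to 1 - \rho$ and $u(t) \to 1$.

```python
import math


def P(t: float, lam: float, mu: float) -> float:
    """Eq. (1): probability that a single lineage has descendants after time t.

    Assumes lam != mu; the critical case lam == mu needs the limit delta -> 0.
    """
    d = lam - mu
    return d / (lam - mu * math.exp(-d * t))


def u(t: float, lam: float, mu: float) -> float:
    """Eq. (2): given survival, the number of descendants after time t is k
    with probability (1 - u(t)) * u(t)**(k - 1)."""
    d = lam - mu
    e = math.exp(-d * t)
    return lam * (1.0 - e) / (lam - mu * e)


if __name__ == "__main__":
    lam, mu = 2.0, 0.5  # hypothetical example rates with lam > mu
    for t in (0.0, 1.0, 5.0, 50.0):
        print(f"t = {t:5.1f}   P(t) = {P(t, lam, mu):.6f}   u(t) = {u(t, lam, mu):.6f}")
```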



2 LTT plots for trees of known age 



In a lineage-through-time (LTT) plot, we plot time against the number of species at that time. For a reconstructed tree, the expected LTT plot after a time t is given analytically in Nee et al. (1994).

However, when analyzing data, we have trees on a given number of species, n. The aim of this section is to calculate the density and expectation for the number of species at time $\sigma t$, $\sigma \in [0,1]$, in a reconstructed tree at time t after its origin. We call this random variable $M_{\sigma,t}$. We condition $M_{\sigma,t}$ on $M_{1,t} = n$, i.e. on having n species today.



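Theorem 2.1. Let today be time t and assume we have n species today. The probability that at time $\sigma t$ we have m species in the reconstructed tree is

$$P[M_{\sigma,t} = m \mid M_{1,t} = n] = \begin{cases} \dbinom{n-1}{m-1} \dfrac{f(\sigma,t,\rho,\delta)^{m-1}}{(1 + f(\sigma,t,\rho,\delta))^{n-1}} & \text{if } 1 \le m \le n, \\ 0 & \text{else,} \end{cases} \tag{3}$$

with $f(\sigma,t,\rho,\delta) = (1-\rho)\, \dfrac{e^{-(1-\sigma)\delta t} - e^{-\delta t}}{\big(1 - e^{-(1-\sigma)\delta t}\big)\big(1 - \rho e^{-\delta t}\big)}$.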

Proof. Since we are considering reconstructed trees, we obviously have $P[M_{\sigma,t} = m \mid M_{1,t} = n] = 0$ if $m > n$. For $m \le n$, Bayes' law gives

$$P[M_{\sigma,t} = m \mid M_{1,t} = n] = \frac{P[M_{1,t} = n \mid M_{\sigma,t} = m]\; P[M_{\sigma,t} = m]}{P[M_{1,t} = n]}. \tag{4}$$



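The probability that a lineage in the reconstructed tree at time $\sigma t$ has m descendants today, at time t, is

$$p_t(\sigma, m) = \big(1 - u((1-\sigma)t)\big)\, u((1-\sigma)t)^{m-1},$$

which is established in Nee et al. (1994), Equation (4). Therefore, writing $i = (i_1, \dots, i_m) \in \mathbb{N}^m$ for the numbers of descendants of the m lineages and $e = (1, 1, \dots, 1)^T$, we get

$$\begin{aligned} P[M_{1,t} = n \mid M_{\sigma,t} = m] &= \sum_{i \in \mathbb{N}^m,\ i^T e = n}\ \prod_{k=1}^{m} p_t(\sigma, i_k) = \sum_{i \in \mathbb{N}^m,\ i^T e = n}\ \prod_{k=1}^{m} \big(1 - u((1-\sigma)t)\big)\, u((1-\sigma)t)^{i_k - 1} \\ &= \big|\{i \in \mathbb{N}^m : i^T e = n\}\big|\, \big(1 - u((1-\sigma)t)\big)^{m}\, u((1-\sigma)t)^{n-m}. \end{aligned}$$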



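We determine $|\{i \in \mathbb{N}^m : i^T e = n\}|$. For every component of i, we have $i_k \ge 1$, $k = 1, \dots, m$. So we have to count in how many ways we can distribute the remaining $n - m$ ones to the m components. Distributing the $n - m$ ones to m components is equivalent to drawing $n - m$ times from an urn with m different balls, returning the ball to the urn after each draw and disregarding the order of the draws. From combinatorics, we know that there are $\binom{m + (n-m) - 1}{n-m} = \binom{n-1}{m-1}$ different outcomes. So $|\{i \in \mathbb{N}^m : i^T e = n\}| = \binom{n-1}{m-1}$. Therefore,

$$P[M_{1,t} = n \mid M_{\sigma,t} = m] = \binom{n-1}{m-1} \big(1 - u((1-\sigma)t)\big)^{m}\, u((1-\sigma)t)^{n-m}.$$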



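In Nee et al. (1994), the authors establish (Equations (9) and (3))

$$\begin{aligned} P[M_{\sigma,t} = m] &= P(t) \left(1 - \frac{u(\sigma t)\, P(t)}{P(\sigma t)}\right) \left(\frac{u(\sigma t)\, P(t)}{P(\sigma t)}\right)^{m-1}, \\ P[M_{1,t} = n] &= P(t)\,(1 - u(t))\, u(t)^{n-1}. \end{aligned}$$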

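Plugging these equations into Equation (4) yields

$$\begin{aligned} P[M_{\sigma,t} = m \mid M_{1,t} = n] &= \binom{n-1}{m-1} \frac{\big(1 - u((1-\sigma)t)\big)^{m} u((1-\sigma)t)^{n-m}\; P(t) \left(1 - \frac{u(\sigma t)P(t)}{P(\sigma t)}\right) \left(\frac{u(\sigma t)P(t)}{P(\sigma t)}\right)^{m-1}}{P(t)\,(1 - u(t))\,u(t)^{n-1}} \\ &= \binom{n-1}{m-1} \left( u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} \right)^{m-1} g_{\sigma,t,n}, \end{aligned}$$

where $g_{\sigma,t,n} = \dfrac{u((1-\sigma)t)^{n-1}\big(1 - u((1-\sigma)t)\big)\left(1 - \frac{u(\sigma t)P(t)}{P(\sigma t)}\right)}{(1 - u(t))\,u(t)^{n-1}}$ does not depend on m. In the following, we determine $g_{\sigma,t,n}$. Since probabilities add up to 1, we have $\sum_{m=1}^{n} P[M_{\sigma,t} = m \mid M_{1,t} = n] = 1$.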
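By the binomial theorem,

$$\sum_{m=1}^{n} \binom{n-1}{m-1} \left( u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} \right)^{m-1} = \left( 1 + u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} \right)^{n-1}.$$

Therefore,

$$g_{\sigma,t,n} = \left( 1 + u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} \right)^{-(n-1)}.$$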

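We evaluate

$$u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} = \frac{(\lambda - \mu)\big(1 - e^{-(\lambda-\mu)\sigma t}\big)\, e^{-(\lambda-\mu)(1-\sigma)t}}{\big(1 - e^{-(\lambda-\mu)(1-\sigma)t}\big)\big(\lambda - \mu e^{-(\lambda-\mu)t}\big)}$$

with $P(t)$ and $u(t)$ from Equations (1) and (2). So

$$f(\sigma,t,\rho,\delta) := u(\sigma t)\, \frac{1 - u((1-\sigma)t)}{u((1-\sigma)t)}\, \frac{P(t)}{P(\sigma t)} = (1-\rho)\, \frac{e^{-(1-\sigma)\delta t} - e^{-\delta t}}{\big(1 - e^{-(1-\sigma)\delta t}\big)\big(1 - \rho e^{-\delta t}\big)}.$$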
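Therefore,

$$P[M_{\sigma,t} = m \mid M_{1,t} = n] = \binom{n-1}{m-1} \frac{f(\sigma,t,\rho,\delta)^{m-1}}{(1 + f(\sigma,t,\rho,\delta))^{n-1}},$$

which establishes the theorem. □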
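Both ingredients of the proof can be checked numerically. The Python sketch below (function names and parameter values are our own, hypothetical choices) compares the product form of f built from Equations (1) and (2) with the closed form in Theorem 2.1, verifies that the probabilities in Equation (3) sum to one, and counts the set $\{i \in \mathbb{N}^m : i^T e = n\}$ by brute force to compare it with the binomial coefficient from the urn argument. It assumes $\lambda > \mu$ and $0 < \sigma < 1$.

```python
import math
from itertools import product


def P(t, lam, mu):
    """Eq. (1)."""
    d = lam - mu
    return d / (lam - mu * math.exp(-d * t))


def u(t, lam, mu):
    """Eq. (2)."""
    d = lam - mu
    e = math.exp(-d * t)
    return lam * (1.0 - e) / (lam - mu * e)


def f_from_nee(sigma, t, lam, mu):
    """f as the product of Nee et al.'s functions; requires 0 < sigma < 1."""
    a, b = sigma * t, (1.0 - sigma) * t
    return u(a, lam, mu) * (1.0 - u(b, lam, mu)) / u(b, lam, mu) * P(t, lam, mu) / P(a, lam, mu)


def f_closed(sigma, t, rho, delta):
    """Closed form of f(sigma, t, rho, delta) stated in Theorem 2.1."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def pmf(m, n, f):
    """Equation (3): P[M_{sigma,t} = m | M_{1,t} = n]."""
    if not 1 <= m <= n:
        return 0.0
    return math.comb(n - 1, m - 1) * f ** (m - 1) / (1.0 + f) ** (n - 1)


def count_compositions(n, m):
    """Brute-force size of {i in N^m : i_1 + ... + i_m = n}, with every i_k >= 1."""
    return sum(1 for i in product(range(1, n + 1), repeat=m) if sum(i) == n)


lam, mu, t, sigma, n = 2.0, 0.5, 3.0, 0.4, 10  # hypothetical example values
f = f_closed(sigma, t, mu / lam, lam - mu)
print(f, f_from_nee(sigma, t, lam, mu))
print([round(pmf(m, n, f), 4) for m in range(1, n + 1)])
```

The brute-force count is only practical for small n and m; it serves as a check of the combinatorial step, not as a way to compute the density.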



Remark 2.2. Note that $f(\sigma, t, \rho, \delta) = f(\sigma, \delta t, \rho, 1)$. Therefore, the conditional distribution $P[M_{\sigma,t} = m \mid M_{1,t} = n]$ with parameters $\rho, \delta$ is the same as $P[M_{\sigma,\delta t} = m \mid M_{1,\delta t} = n]$ with parameters $\rho, 1$.
Corollary 2.3. The expectation of $M_{\sigma,t}$ given $M_{1,t} = n$ is

$$E[M_{\sigma,t} \mid M_{1,t} = n] = \frac{1 + n f(\sigma,t,\rho,\delta)}{1 + f(\sigma,t,\rho,\delta)}.$$



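Proof. Writing f for $f(\sigma,t,\rho,\delta)$, Theorem 2.1 gives

$$\begin{aligned} E[M_{\sigma,t} \mid M_{1,t} = n] &= \sum_{m=1}^{n} m \binom{n-1}{m-1} \frac{f^{m-1}}{(1+f)^{n-1}} = \frac{1}{(1+f)^{n-1}} \sum_{m=0}^{n-1} (m+1) \binom{n-1}{m} f^{m} \\ &= \frac{1}{(1+f)^{n-1}} \left( (1+f)^{n-1} + (n-1) f \sum_{m=1}^{n-1} \binom{n-2}{m-1} f^{m-1} \right) \\ &= \frac{(1+f)^{n-1} + (n-1) f (1+f)^{n-2}}{(1+f)^{n-1}} = \frac{1 + n f}{1 + f}, \end{aligned}$$

which establishes the corollary. □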

Note that for a fixed n, the conditional expectation $E[M_{\sigma,t} \mid M_{1,t} = n]$ only depends on $\rho$ and $\delta t$. For $\rho = 0, 1/4, 1/2, 3/4, 1$, $t = 10$ and varying values of $\delta$, we calculated the expectation, see Figure 1. The curves look unusual for LTT plots of reconstructed trees: we obtain concave curves, and for large $\lambda$ the Yule model gives a more convex curve than the models with extinction.

The reason is the following. Consider the curves for arbitrary $\lambda$ and $\rho = 0$. We condition on the age t of the tree. If $\lambda$ is very large, i.e. the process will have more than n lineages at time t with high probability (when not conditioning on n), then the most likely trees with n species are those in which nothing happens at the beginning and speciation only occurs later. If a lot of speciation happened at the beginning, all those lineages would subsequently have to speciate very rarely in order to end up with n species. This is very unlikely, since $\lambda$ is large. If instead the single initial lineage does not speciate for a while and "normal" speciation sets in afterwards, the outcome is much more likely, since only the first lineage is forced to behave atypically. This yields a very convex LTT plot.

If $\lambda$ is small relative to $1/t$ (i.e. $\lambda t$ is small), the early lineages need to speciate a lot. The later lineages can then behave quite normally and still end up with n lineages today.



[Figure 1; horizontal axis: time (scaled)]

Figure 1: Expected number of species given we have n = 10 species today at time t = 10. We calculated for $\lambda = 5, 2, 1, 0.5, 0.2, 0.1, 0.01$, from bottom to top. The different colours correspond to green: $\rho = 0$, yellow: $\rho = 1/4$, blue: $\rho = 1/2$, red: $\rho = 3/4$, black: $\rho = 1$.
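As a sketch of how curves like those in Figure 1 could be computed (we do not know how the original figure was produced), the following Python code evaluates Corollary 2.3 on a grid of $\sigma$ values. It uses the equivalent form $(1 + nf)/(1 + f) = 1 + (n-1) f/(1+f)$, where $f/(1+f)$ simplifies to $(1-\rho)(e^{-(1-\sigma)\delta t} - e^{-\delta t})/((1 - e^{-\delta t})(1 - \rho e^{-(1-\sigma)\delta t}))$; this rearrangement is ours and avoids the division by zero in f at $\sigma = 1$. The parameter grid is an assumption, and $\rho = 1$ is skipped because $\delta = 0$ would require a limit.

```python
import math


def f_closed(sigma, t, rho, delta):
    """f(sigma, t, rho, delta) from Theorem 2.1 (0 <= sigma < 1)."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def success_prob(sigma, t, rho, delta):
    """f / (1 + f), rearranged to stay finite at sigma = 1 (requires rho < 1, delta > 0)."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - e) * (1.0 - rho * a))


def expected_ltt(sigma, n, t, rho, delta):
    """Corollary 2.3: (1 + n f) / (1 + f) = 1 + (n - 1) f / (1 + f)."""
    return 1.0 + (n - 1) * success_prob(sigma, t, rho, delta)


n, t = 10, 10.0
grid = [k / 20 for k in range(21)]
for rho in (0.0, 0.25, 0.5, 0.75):   # rho = 1 (delta = 0) would need a limit and is skipped
    for lam in (5.0, 1.0, 0.1):
        delta = lam * (1.0 - rho)     # delta = lambda - mu = lambda (1 - rho)
        curve = [expected_ltt(s, n, t, rho, delta) for s in grid]
        print(f"rho={rho:4.2f} lam={lam:4.1f}", [round(x, 2) for x in curve[::5]])
```

Because the curve depends on $\delta$ and t only through $\delta t$ (Remark 2.2), the same code covers any time scale.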
2.1 Conditioning on the most recent common ancestor

So far, we have conditioned on the time of origin of our tree. In other situations, we might know the time of the most recent common ancestor (mrca) of the extant species as opposed to the time of origin.

Let $M^{mrca}_{\sigma,t}$ be the random variable "number of lineages in the reconstructed tree at time $\sigma t$, given that the time since the mrca is t".

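Corollary 2.4. For $M^{mrca}_{\sigma,t}$, we have the following conditional density,

$$P[M^{mrca}_{\sigma,t} = m \mid M^{mrca}_{1,t} = n] = \frac{1}{n-1}\, \frac{f(\sigma,t,\rho,\delta)^{m-2}}{(1 + f(\sigma,t,\rho,\delta))^{n-2}} \sum_{k=1}^{n-1} \sum_{l=1}^{m-1} \binom{k-1}{l-1} \binom{n-k-1}{m-l-1}$$

for $m \le n$, and $P[M^{mrca}_{\sigma,t} = m \mid M^{mrca}_{1,t} = n] = 0$ otherwise.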

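Proof. Reconstructed trees under the constant rate birth and death process with n leaves have the same shape distribution as Yule trees (Aldous 2001). The two daughter trees of the mrca are denoted by $\mathcal{T}_1, \mathcal{T}_2$; they are trees with origin at the mrca, and together they have n leaves. The probability that $\mathcal{T}_1$ has k leaves ($k = 1, 2, \dots, n-1$) is $1/(n-1)$ (Slowinski 1990). Since, given their numbers of leaves, the two daughter trees evolve independently, Theorem 2.1 applied to each of them gives

$$\begin{aligned} P[M^{mrca}_{\sigma,t} = m \mid M^{mrca}_{1,t} = n] &= \frac{1}{n-1} \sum_{k=1}^{n-1} P[M^{mrca}_{\sigma,t} = m \mid M^{\mathcal{T}_1}_{1,t} = k, M^{\mathcal{T}_2}_{1,t} = n-k] \\ &= \frac{1}{n-1} \sum_{k=1}^{n-1} \sum_{l=1}^{m-1} P[M^{\mathcal{T}_1}_{\sigma,t} = l, M^{\mathcal{T}_2}_{\sigma,t} = m-l \mid M^{\mathcal{T}_1}_{1,t} = k, M^{\mathcal{T}_2}_{1,t} = n-k] \\ &= \frac{1}{n-1} \sum_{k=1}^{n-1} \sum_{l=1}^{m-1} \binom{k-1}{l-1} \frac{f(\sigma,t,\rho,\delta)^{l-1}}{(1+f(\sigma,t,\rho,\delta))^{k-1}} \binom{n-k-1}{m-l-1} \frac{f(\sigma,t,\rho,\delta)^{m-l-1}}{(1+f(\sigma,t,\rho,\delta))^{n-k-1}} \\ &= \frac{1}{n-1}\, \frac{f(\sigma,t,\rho,\delta)^{m-2}}{(1 + f(\sigma,t,\rho,\delta))^{n-2}} \sum_{k=1}^{n-1} \sum_{l=1}^{m-1} \binom{k-1}{l-1} \binom{n-k-1}{m-l-1}, \end{aligned}$$

which establishes the corollary. □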
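The double sum in Corollary 2.4 can be evaluated literally. By Vandermonde's identity the inner sum equals $\binom{n-2}{m-2}$ for every k, so the density collapses to $\binom{n-2}{m-2} f^{m-2}/(1+f)^{n-2}$; this simplification is our observation, not stated above. The Python sketch below (hypothetical names and parameters, assuming $n \ge 2$, $0 < \sigma < 1$ and $\rho < 1$) checks both forms against each other and checks that the density sums to one.

```python
import math


def f_closed(sigma, t, rho, delta):
    """f(sigma, t, rho, delta) from Theorem 2.1 (0 <= sigma < 1)."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def pmf_mrca(m, n, f):
    """Corollary 2.4, evaluated literally as the double sum over k and l (n >= 2).

    For m = 1 the inner sum is empty, so the probability is 0.
    """
    if not 2 <= m <= n:
        return 0.0
    total = sum(
        math.comb(k - 1, l - 1) * math.comb(n - k - 1, m - l - 1)
        for k in range(1, n)
        for l in range(1, m)
    )
    return total * f ** (m - 2) / ((n - 1) * (1.0 + f) ** (n - 2))


def pmf_mrca_vandermonde(m, n, f):
    """Same density after collapsing the sums with Vandermonde's identity (our simplification)."""
    if not 2 <= m <= n:
        return 0.0
    return math.comb(n - 2, m - 2) * f ** (m - 2) / (1.0 + f) ** (n - 2)


f = f_closed(0.5, 4.0, 0.3, 1.0)  # hypothetical sigma, t, rho, delta
print([round(pmf_mrca(m, 12, f), 4) for m in range(1, 13)])
```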



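Corollary 2.5. For $M^{mrca}_{\sigma,t}$, we have the following conditional expectation,

$$E[M^{mrca}_{\sigma,t} \mid M^{mrca}_{1,t} = n] = \frac{2 + n f(\sigma,t,\rho,\delta)}{1 + f(\sigma,t,\rho,\delta)}.$$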



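Proof. The two daughter trees $\mathcal{T}_1, \mathcal{T}_2$ of the mrca are trees which have their origin at the mrca; together they have n leaves. Since the probability of $\mathcal{T}_1$ having k leaves ($k = 1, 2, \dots, n-1$) is $1/(n-1)$ (Slowinski 1990), Corollary 2.3 gives

$$\begin{aligned} E[M^{mrca}_{\sigma,t} \mid M^{mrca}_{1,t} = n] &= \frac{1}{n-1} \sum_{k=1}^{n-1} \Big( E[M_{\sigma,t} \mid M_{1,t} = k] + E[M_{\sigma,t} \mid M_{1,t} = n-k] \Big) \\ &= \frac{1}{n-1} \sum_{k=1}^{n-1} \frac{1 + k f(\sigma,t,\rho,\delta) + 1 + (n-k) f(\sigma,t,\rho,\delta)}{1 + f(\sigma,t,\rho,\delta)} = \frac{2 + n f(\sigma,t,\rho,\delta)}{1 + f(\sigma,t,\rho,\delta)}, \end{aligned}$$

which completes the proof. □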
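The averaging step of this proof can be mirrored directly. The following Python sketch (names and parameter values are ours and purely illustrative) averages Corollary 2.3 over the uniform split of the n leaves between the two daughter trees and compares the result with the closed form of Corollary 2.5.

```python
import math


def f_closed(sigma, t, rho, delta):
    """f(sigma, t, rho, delta) from Theorem 2.1 (0 <= sigma < 1)."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def expected_origin(n, f):
    """Corollary 2.3: E[M_{sigma,t} | M_{1,t} = n]."""
    return (1.0 + n * f) / (1.0 + f)


def expected_mrca(n, f):
    """Average over the uniform split k in {1, ..., n-1} of the leaves (Slowinski 1990)."""
    return sum(expected_origin(k, f) + expected_origin(n - k, f) for k in range(1, n)) / (n - 1)


f = f_closed(0.5, 4.0, 0.3, 1.0)  # hypothetical sigma, t, rho, delta
print(expected_mrca(10, f), (2.0 + 10 * f) / (1.0 + f))
```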



The following result had already been established in a completely different way in Nee et al. (1994). Since we end up with the same result as Nee et al. (1994), this provides a check of our calculations.



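Corollary 2.6. The expected number of species at time $\sigma t$, conditioned on the process surviving until t, is

$$E[M_{\sigma,t} \mid M_{1,t} > 0] = e^{(\lambda-\mu)\sigma t}\, \frac{\lambda - \mu e^{-(\lambda-\mu)t}}{\lambda - \mu e^{-(\lambda-\mu)(1-\sigma)t}}.$$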



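Proof. We can write the expectation as

$$\begin{aligned} E[M_{\sigma,t} \mid M_{1,t} > 0] &= \sum_{n=1}^{\infty} E[M_{\sigma,t} \mid M_{1,t} = n]\, P[M_{1,t} = n \mid M_{1,t} > 0] = \sum_{n=1}^{\infty} \frac{1 + n f(\sigma,t,\rho,\delta)}{1 + f(\sigma,t,\rho,\delta)}\, (1 - u(t))\, u(t)^{n-1} \\ &= \frac{1}{1 + f(\sigma,t,\rho,\delta)} \left( 1 + \frac{f(\sigma,t,\rho,\delta)}{1 - u(t)} \right), \end{aligned}$$

where we used $P[M_{1,t} = n \mid M_{1,t} > 0] = P[M_{1,t} = n]/P(t) = (1 - u(t))\,u(t)^{n-1}$, $\sum_{n \ge 1} (1 - u(t))\,u(t)^{n-1} = 1$ and $\sum_{n \ge 1} n\,(1 - u(t))\,u(t)^{n-1} = 1/(1 - u(t))$. With Equation (2) and the expression for f from Theorem 2.1,

$$1 + \frac{f(\sigma,t,\rho,\delta)}{1 - u(t)} = \frac{e^{\delta\sigma t}\big(1 - e^{-\delta t}\big)}{1 - e^{-(1-\sigma)\delta t}}, \qquad 1 + f(\sigma,t,\rho,\delta) = \frac{\big(1 - e^{-\delta t}\big)\big(1 - \rho e^{-(1-\sigma)\delta t}\big)}{\big(1 - e^{-(1-\sigma)\delta t}\big)\big(1 - \rho e^{-\delta t}\big)},$$

so that

$$E[M_{\sigma,t} \mid M_{1,t} > 0] = e^{\delta\sigma t}\, \frac{1 - \rho e^{-\delta t}}{1 - \rho e^{-(1-\sigma)\delta t}} = e^{(\lambda-\mu)\sigma t}\, \frac{\lambda - \mu e^{-(\lambda-\mu)t}}{\lambda - \mu e^{-(\lambda-\mu)(1-\sigma)t}},$$

which establishes the corollary. □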
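A numerical sanity check of Corollary 2.6: the Python sketch below (our own, with hypothetical rates $\lambda = 2$, $\mu = 0.5$) truncates the series from the first line of the proof and compares it with the closed form, written equivalently as $e^{(\lambda-\mu)\sigma t} P((1-\sigma)t)/P(t)$ using Equation (1). For $\mu = 0$ the closed form reduces to $e^{\lambda\sigma t}$.

```python
import math


def P(t, lam, mu):
    """Eq. (1)."""
    d = lam - mu
    return d / (lam - mu * math.exp(-d * t))


def u(t, lam, mu):
    """Eq. (2)."""
    d = lam - mu
    e = math.exp(-d * t)
    return lam * (1.0 - e) / (lam - mu * e)


def f_closed(sigma, t, rho, delta):
    """f(sigma, t, rho, delta) from Theorem 2.1 (0 <= sigma < 1)."""
    a = math.exp(-(1.0 - sigma) * delta * t)
    e = math.exp(-delta * t)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def expected_survival_series(sigma, t, lam, mu, n_max=5000):
    """First line of the proof, truncated at n_max (the tail decays like u(t)**n)."""
    f = f_closed(sigma, t, mu / lam, lam - mu)
    q = u(t, lam, mu)
    return sum((1.0 + n * f) / (1.0 + f) * (1.0 - q) * q ** (n - 1) for n in range(1, n_max + 1))


def expected_survival_closed(sigma, t, lam, mu):
    """Corollary 2.6, written as exp((lam - mu) sigma t) * P((1 - sigma) t) / P(t)."""
    return math.exp((lam - mu) * sigma * t) * P((1.0 - sigma) * t, lam, mu) / P(t, lam, mu)


print(expected_survival_series(0.6, 3.0, 2.0, 0.5), expected_survival_closed(0.6, 3.0, 2.0, 0.5))
```

The truncation level `n_max` is an assumption that is adequate here because $u(t)^{n}$ is negligible at $n = 5000$ for these rates; for older trees, where $u(t)$ is closer to 1, it would have to grow.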



3 LTT plots for trees of unknown age 

So far, we assumed that the time since origin is known to be t. We then calculated the expected number of species for each point in time between the origin and today.

The assumption that the time of origin is known, while nothing is known about the timing after that, seems a bit artificial to me. Aldous and Popovic assumed that any point of time in the past is equally likely to be the point of origin of a tree. Conditioning on n species then gives the distribution $q_{or}(t)$ for the time of origin. I was wondering whether we want to write something about why it is plausible to use the uniform assumption?

If the age of the tree is unknown, we can assume that the age distribution is uniform on $[0, \infty)$. This (improper) prior has been assumed before in Aldous and Popovic (2005) and Gernhard. We will need the following theorem from Gernhard (2007a).



Theorem 3.1. Let $t_{or}$ be the time of origin of a tree. Let $q_{or}(t)$ be the density function of $t_{or}$. Our prior is the uniform distribution of the time of origin on $[0, \infty)$. Conditioning the tree on having n species today, we obtain the following density function for the time of origin of the tree,

$$q_{or}(t \mid n) = n \lambda^{n} (\lambda - \mu)^2\, \frac{\big(1 - e^{-(\lambda-\mu)t}\big)^{n-1} e^{-(\lambda-\mu)t}}{\big(\lambda - \mu e^{-(\lambda-\mu)t}\big)^{n+1}}.$$
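Differentiating shows that $q_{or}(t \mid n) = \frac{d}{dt} u(t)^n$, i.e. the conditional distribution function of the age is $u(t)^n$ (our observation, obtained from Equation (2)). The Python sketch below, with hypothetical rates and our own helper names, checks this numerically and checks that the density integrates to one, using a composite Simpson rule on the truncated range $[0, 60]$, whose tail mass is negligible for these rates.

```python
import math


def u(t, lam, mu):
    """Eq. (2)."""
    d = lam - mu
    e = math.exp(-d * t)
    return lam * (1.0 - e) / (lam - mu * e)


def q_or(t, n, lam, mu):
    """Theorem 3.1: density of the time of origin given n extant species (lam > mu)."""
    d = lam - mu
    e = math.exp(-d * t)
    return n * lam ** n * d ** 2 * (1.0 - e) ** (n - 1) * e / (lam - mu * e) ** (n + 1)


def simpson(g, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    odd = sum(g(a + (2 * k - 1) * h) for k in range(1, steps // 2 + 1))
    even = sum(g(a + 2 * k * h) for k in range(1, steps // 2))
    return (g(a) + g(b) + 4.0 * odd + 2.0 * even) * h / 3.0


lam, mu, n = 2.0, 1.0, 10  # hypothetical example values
print(simpson(lambda t: q_or(t, n, lam, mu), 0.0, 60.0))
```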



Let $M_\sigma$ be the random variable "number of lineages in the reconstructed tree when the fraction $\sigma$ of the time until today is over". We obtain the following.

Remark 3.2. The probability for m lineages at time $\sigma t$ given n species at time t is

$$P[M_\sigma = m \mid M_1 = n] = \int_0^\infty P[M_{\sigma,t} = m \mid M_{1,t} = n]\, q_{or}(t \mid n)\, dt.$$

We did not find an analytic expression for that integral.
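Although we have no closed form for this integral, it is easy to evaluate numerically. The sketch below is one possible approach, not the method used for the figures: since $u(t)^n$ is the distribution function of the age under $q_{or}(t \mid n)$ (which follows from differentiating $u(t)^n$), the substitution $v = u(t)^n$ turns the integral into an average over $v \in (0,1)$, which we approximate with the midpoint rule. By Remark 2.2, and because u depends on t only through $\delta t$, we may work in units with $\delta = 1$. Names, the step count, and the parameters are assumptions; the code requires $0 \le \rho < 1$ and $0 < \sigma < 1$.

```python
import math


def f_scaled(sigma, s, rho):
    """f(sigma, s, rho, 1) with s = delta * t (Remark 2.2); requires 0 < sigma < 1."""
    a = math.exp(-(1.0 - sigma) * s)
    e = math.exp(-s)
    return (1.0 - rho) * (a - e) / ((1.0 - a) * (1.0 - rho * e))


def scaled_age(v, n, rho):
    """Invert the distribution function u(t)**n of the scaled age s = delta * t."""
    w = v ** (1.0 / n)  # w = u(t) = (1 - e^{-s}) / (1 - rho e^{-s})
    return -math.log((1.0 - w) / (1.0 - rho * w))


def pmf_unknown_age(n, sigma, rho, steps=20000):
    """Remark 3.2 via the midpoint rule in v = u(t)**n; entry m-1 holds P[M_sigma = m | M_1 = n]."""
    probs = [0.0] * n
    for k in range(steps):
        f = f_scaled(sigma, scaled_age((k + 0.5) / steps, n, rho), rho)
        for m in range(1, n + 1):
            probs[m - 1] += math.comb(n - 1, m - 1) * f ** (m - 1) / (1.0 + f) ** (n - 1)
    return [p / steps for p in probs]


probs = pmf_unknown_age(10, 0.5, 0.25)  # hypothetical n, sigma, rho
print([round(p, 4) for p in probs])
```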
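For the expectation, we get, using $E[M_{\sigma,t} \mid M_{1,t} = n] = 1 + (n-1) f(\sigma,t,\rho,\delta)/(1 + f(\sigma,t,\rho,\delta))$ from Corollary 2.3,

$$\begin{align} E[M_\sigma \mid M_1 = n] &= \int_0^\infty E[M_{\sigma,t} \mid M_{1,t} = n]\, q_{or}(t \mid n)\, dt \tag{5} \\ &= 1 + \int_0^\infty (n-1)\, \frac{(\lambda-\mu)\big(e^{-(\lambda-\mu)(1-\sigma)t} - e^{-(\lambda-\mu)t}\big)}{\big(1 - e^{-(\lambda-\mu)t}\big)\big(\lambda - \mu e^{-(\lambda-\mu)(1-\sigma)t}\big)}\; n\lambda^n(\lambda-\mu)^2\, \frac{\big(1 - e^{-(\lambda-\mu)t}\big)^{n-1} e^{-(\lambda-\mu)t}}{\big(\lambda - \mu e^{-(\lambda-\mu)t}\big)^{n+1}}\, dt \notag \\ &= 1 + n(n-1)\lambda^n(\lambda-\mu)^3 \int_0^\infty \frac{\big(e^{-(2-\sigma)(\lambda-\mu)t} - e^{-2(\lambda-\mu)t}\big)\big(1 - e^{-(\lambda-\mu)t}\big)^{n-2}}{\big(\lambda - \mu e^{-(\lambda-\mu)(1-\sigma)t}\big)\big(\lambda - \mu e^{-(\lambda-\mu)t}\big)^{n+1}}\, dt \tag{6} \\ &= 1 + n(n-1)\lambda^n(\lambda-\mu)^2 \int_0^\infty \frac{\big(e^{-(2-\sigma)t} - e^{-2t}\big)\big(1 - e^{-t}\big)^{n-2}}{\big(\lambda - \mu e^{-(1-\sigma)t}\big)\big(\lambda - \mu e^{-t}\big)^{n+1}}\, dt \tag{7} \\ &= 1 + n(n-1)(1-\rho)^2 \int_0^\infty \frac{\big(e^{-(2-\sigma)t} - e^{-2t}\big)\big(1 - e^{-t}\big)^{n-2}}{\big(1 - \rho e^{-(1-\sigma)t}\big)\big(1 - \rho e^{-t}\big)^{n+1}}\, dt. \tag{8} \end{align}$$

Here, (7) follows from (6) by substituting $(\lambda-\mu)t$ by t, and (8) follows by dividing numerator and denominator by $\lambda^{n+2}$.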
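The integral in Equation (8) can be evaluated numerically in many ways. The Python sketch below is only an illustration under our own assumptions (the text states that the figures were computed with Matlab's ode45; this is a different method): it substitutes $x = e^{-t}$, which maps $[0, \infty)$ to $(0, 1]$ and gives the integrand $(x^{1-\sigma} - x)(1-x)^{n-2} / ((1 - \rho x^{1-\sigma})(1 - \rho x)^{n+1})$, and then applies the midpoint rule. It requires $0 \le \rho < 1$.

```python
import math


def expected_unknown_age(sigma, n, rho, steps=20000):
    """Equation (8), integrated with the midpoint rule after substituting x = exp(-t) (our choice)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoints avoid evaluating at x = 0
        num = (x ** (1.0 - sigma) - x) * (1.0 - x) ** (n - 2)
        den = (1.0 - rho * x ** (1.0 - sigma)) * (1.0 - rho * x) ** (n + 1)
        total += num / den
    return 1.0 + n * (n - 1) * (1.0 - rho) ** 2 * total * h


for rho in (0.25, 0.5, 0.75):
    print(rho, [round(expected_unknown_age(s, 10, rho), 3) for s in (0.25, 0.5, 0.75)])

# Yule case (rho = 0): with x = exp(-t), the integral becomes a difference of Beta
# functions, giving n (n - 1) B(2 - sigma, n - 1) (our rearrangement), used as a check.
print(expected_unknown_age(0.5, 10, 0.0), 90 * math.gamma(1.5) * math.gamma(9) / math.gamma(10.5))
```

At $\sigma = 0$ the integrand vanishes and the expectation is 1; at $\sigma = 1$ it must equal n, which gives two further checks.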

Note that the expectation only depends on $\rho = \mu/\lambda$. In general, we could not find an analytical solution for the integral; the expected LTT plots are drawn via numerical integration. However, for the Yule model, $\mu = 0$, we can evaluate the integral. From Equation (7), we get



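$$\begin{aligned} E_{Yule}[M_\sigma \mid M_1 = n] &= 1 + n(n-1) \int_0^\infty \big(e^{-(2-\sigma)t} - e^{-2t}\big)\big(1 - e^{-t}\big)^{n-2}\, dt \\ &= 1 + n(n-1) \sum_{k=0}^{n-2} \binom{n-2}{k} (-1)^k \int_0^\infty \big(e^{-(k+2-\sigma)t} - e^{-(k+2)t}\big)\, dt \\ &= 1 + n(n-1) \sum_{k=0}^{n-2} \binom{n-2}{k} (-1)^k \left[ -\frac{e^{-(k+2-\sigma)t}}{k+2-\sigma} + \frac{e^{-(k+2)t}}{k+2} \right]_0^\infty \\ &= 1 + n(n-1) \sum_{k=0}^{n-2} \binom{n-2}{k} (-1)^k \left( \frac{1}{k+2-\sigma} - \frac{1}{k+2} \right). \end{aligned}$$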
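The alternating sum involves cancellation between binomial terms, which can cost accuracy for large n. Substituting $x = e^{-t}$ in the first integral turns it into Beta integrals, $\int_0^1 (x^{1-\sigma} - x)(1-x)^{n-2}\,dx = B(2-\sigma, n-1) - B(2, n-1)$, and since $B(2, n-1) = 1/(n(n-1))$ this gives the equivalent form $n(n-1)\,B(2-\sigma, n-1)$. This rearrangement is ours; the Python sketch below (hypothetical function names) checks numerically that both expressions agree.

```python
import math


def expected_yule_sum(sigma, n):
    """Yule case: the finite alternating sum derived above."""
    return 1.0 + n * (n - 1) * sum(
        math.comb(n - 2, k) * (-1) ** k * (1.0 / (k + 2 - sigma) - 1.0 / (k + 2))
        for k in range(n - 1)
    )


def expected_yule_beta(sigma, n):
    """The same value as n (n - 1) B(2 - sigma, n - 1), via x = exp(-t) (our rearrangement)."""
    return n * (n - 1) * math.gamma(2 - sigma) * math.gamma(n - 1) / math.gamma(n + 1 - sigma)


for sigma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(sigma, expected_yule_sum(sigma, 10), expected_yule_beta(sigma, 10))
```

The Beta form avoids the cancellation and is therefore the safer choice when n is large.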



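For the critical branching process, i.e. $\lambda = \mu$, we obtain from Equation (6), using $e^{-\epsilon} \approx 1 - \epsilon$ for $\epsilon \to 0$,

$$\begin{aligned} E_{CBP}[M_\sigma \mid M_1 = n] &= \lim_{\mu \to \lambda} \left( 1 + n(n-1) \int_0^\infty \frac{\lambda^n (\lambda-\mu)^3\, \big((\lambda-\mu)\sigma t\big)\big((\lambda-\mu)t\big)^{n-2}}{\big(\lambda - \mu(1 - (\lambda-\mu)(1-\sigma)t)\big)\big(\lambda - \mu(1 - (\lambda-\mu)t)\big)^{n+1}}\, dt \right) \\ &= 1 + n(n-1) \int_0^\infty \frac{\lambda^n \sigma\, t^{n-1}}{\big(1 + \lambda(1-\sigma)t\big)\big(1 + \lambda t\big)^{n+1}}\, dt \\ &= 1 + n(n-1)\sigma \int_0^\infty \frac{t^{n-1}}{\big(1 + (1-\sigma)t\big)\big(1 + t\big)^{n+1}}\, dt. \end{aligned}$$

This establishes the following theorem.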



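Theorem 3.3. The expectation of $M_\sigma$ given $M_1 = n$ is

$$E[M_\sigma \mid M_1 = n] = \begin{cases} 1 + n(n-1) \displaystyle\sum_{k=0}^{n-2} \binom{n-2}{k} (-1)^k \left( \frac{1}{k+2-\sigma} - \frac{1}{k+2} \right) & \text{if } \mu = 0, \\[2ex] 1 + n(n-1)\sigma \displaystyle\int_0^\infty \frac{t^{n-1}}{\big(1 + (1-\sigma)t\big)\big(1 + t\big)^{n+1}}\, dt & \text{if } \mu = \lambda, \\[2ex] 1 + n(n-1)(1-\rho)^2 \displaystyle\int_0^\infty \frac{\big(e^{-(2-\sigma)t} - e^{-2t}\big)\big(1 - e^{-t}\big)^{n-2}}{\big(1 - \rho e^{-(1-\sigma)t}\big)\big(1 - \rho e^{-t}\big)^{n+1}}\, dt & \text{else.} \end{cases} \tag{9}$$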
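For the critical case, the integral can be evaluated numerically after the substitution $t = s/(1-s)$, which maps $[0, \infty)$ to $[0, 1)$ and turns the integrand into $s^{n-1}(1-s)/(1 - \sigma s)$. The substitution, the names, and the step count are our own assumptions, not the authors' Matlab setup. At $\sigma = 1$ the integral equals $1/n$, so the expectation equals n, which serves as a check.

```python
def expected_cbp(sigma, n, steps=20000):
    """Theorem 3.3, case mu = lambda, by the midpoint rule.

    Substituting t = s / (1 - s) (our choice) maps [0, inf) to [0, 1) and turns the
    integrand into s**(n - 1) * (1 - s) / (1 - sigma * s), bounded for 0 <= sigma <= 1.
    """
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += s ** (n - 1) * (1.0 - s) / (1.0 - sigma * s)
    return 1.0 + n * (n - 1) * sigma * total * h


for sigma in (0.25, 0.5, 0.75):
    print(sigma, expected_cbp(sigma, 10))
```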



Note that $E_{Yule}$ and $E_{CBP}$ are independent of $\lambda$. The conditioned expectation for $M_\sigma$ was calculated for different values of $\rho$, see Figure 2. The integration was done with the Matlab ode45 tool.



[Figure 2; horizontal axis: time (scaled)]

Figure 2: Expected number of species given we have n = 10 species today. According to Equation (9), the expectation only depends on $\rho = \mu/\lambda$; we calculated $\rho = 1/4, 1/2, 3/4, 1$ and the Yule model (from bottom to top). The upper black line is the straight line. Note that the Yule model is more convex than the straight line.